Abstract. This paper extends a recently proposed model for combinatorial landscapes,
Local Optima Networks (LON), to incorporate a first-improvement (greedy-ascent)
hill-climbing algorithm, instead of a best-improvement (steepest-ascent) one, for the
definition and extraction of the basins of attraction of the landscape optima. A
statistical analysis comparing best-improvement and first-improvement network models
for a set of NK landscapes is presented and discussed. Our results suggest structural
differences between the two models with respect to both the network connectivity and
the nature of the basins of attraction. The impact of these differences on the behavior
of search heuristics based on first-improvement and best-improvement local search is
thoroughly discussed.

1 Introduction 

The performance of heuristic search algorithms crucially depends on the structural
aspects of the spaces being searched. An improved understanding of this dependency can
facilitate the design and further successful application of these methods to solve hard
computational search problems. Local optima networks (LON) have been recently
introduced as a novel model of combinatorial landscapes [6, 8, 9]. This model allows the
use of complex network analysis techniques [?] in connection with the study of fitness
landscapes and problem difficulty in combinatorial optimisation. The model, inspired
by work in the physical sciences on energy surfaces [3], is based on the idea of
compressing the information given by the whole problem configuration space into a
smaller mathematical object: the graph having as vertices the optima configurations of
the problem and as edges the possible weighted transitions between these optima (see
Figure 1). This characterization of landscapes as networks has brought new insights
into the global structure of the landscapes studied, particularly into the distribution of
their local optima. Moreover, some network features have been found to correlate with,
and suggest explanations for, search difficulty on the studied domains. The study of
local optima networks has also revealed new properties of the basins of attraction.

The current methodology for extracting LONs requires the exhaustive exploration
of the search space, and the use of a best-improvement (steepest-ascent) local search
algorithm from each configuration. In this paper, we are interested in exploring how the
network structure and features of a given landscape change if a first-improvement
(greedy-ascent) local search algorithm is used instead for extracting the basins and
transition probabilities. This is apparently simple but, in reality, requires a careful
redefinition of the concept of a basin of attraction. The new notions are presented in
the next section. Following previous work [8, 9], we use the well-known family of NK
landscapes [4] as an example, as it allows the exploration of landscapes of tunable
ruggedness and search difficulty.

Fig. 1. Visualisation of the weighted local optima network of a small NK landscape
(N = 6, K = 2). The nodes correspond to the local optima basins (with the diameter
indicating the size of basins, and the label "fit", the fitness of the local optima). The
edges depict the transition probabilities between basins as defined in the text.

The article is structured as follows. Section 2 includes the relevant definitions and
algorithms for extracting the LONs. Section 3 describes the experimental design, and
reports the analysis of the extracted networks, including a study of both their basic
features and connectivity, and the nature of the basins of attraction of the local optima.
Finally, Section 4 discusses our main findings and suggests directions for future work.

2 Definitions and algorithms 

A fitness landscape [?] is a triplet $(S, V, f)$ where $S$ is a set of potential solutions,
i.e. a search space; $V : S \to 2^S$, a neighborhood structure, is a function that assigns
to every $s \in S$ a set of neighbors $V(s)$; and $f : S \to \mathbb{R}$ is a fitness function
that can be pictured as the height of the corresponding solutions. In our study, the search
space is composed of binary strings of length $N$, therefore its size is $2^N$. The
neighborhood is defined by the minimum possible move on a binary search space, that is,
the 1-move or bit-flip operation. In consequence, for any given string $s$ of length $N$,
the neighborhood size is $|V(s)| = N$. The HillClimbing algorithm used to determine the
local optima, and therefore to define the basins of attraction, is given in Algorithm 1. It
defines a mapping from the search space $S$ to the set of locally optimal solutions $S^*$.

First-improvement differs from best-improvement local search in the way the next
neighbor is selected in the search process, which is related to the so-called pivot
rule. In best-improvement, the entire neighborhood is explored and the best solution is
returned, whereas in first-improvement, a solution is selected uniformly at random from
the neighborhood and accepted if it improves on the current one (see Algorithm 1).

Algorithm 1 Best-improvement and first-improvement algorithms.

Best-improvement:

    Choose initial solution $s \in S$
    repeat
        choose $s' \in V(s)$ such that $f(s') = \max_{x \in V(s)} f(x)$
        if $f(s) < f(s')$ then
            $s \leftarrow s'$
        end if
    until $s$ is a local optimum

First-improvement:

    Choose initial solution $s \in S$
    repeat
        choose $s' \in V(s)$ using a predefined random ordering
        if $f(s) < f(s')$ then
            $s \leftarrow s'$
        end if
    until $s$ is a local optimum
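
The two pivot rules of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the tuple encoding of bit strings and the toy fitness `two_peaks` are assumptions made for the example, and the first-improvement variant assumes a fresh random ordering of the neighbors at every step.

```python
import random


def neighbors(s):
    """Bit-flip neighborhood V(s): all strings at Hamming distance 1 from s."""
    return [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(len(s))]


def two_peaks(s):
    """Hypothetical toy fitness with two local maxima: all zeros and all ones."""
    ones = sum(s)
    return max(4 * ones, 3 * (len(s) - ones))


def best_improvement(s, f):
    """Steepest ascent: move to the best neighbor while it improves on s."""
    while True:
        best = max(neighbors(s), key=f)
        if f(best) <= f(s):
            return s  # no strictly better neighbor: s is a local optimum
        s = best


def first_improvement(s, f, rng):
    """Greedy ascent: scan neighbors in random order, accept the first improvement."""
    while True:
        candidates = neighbors(s)
        rng.shuffle(candidates)  # assumed: a new random ordering at each step
        for x in candidates:
            if f(x) > f(s):
                s = x
                break
        else:
            return s  # no neighbor improves on s: s is a local optimum


if __name__ == "__main__":
    start = (0, 0, 1, 1)
    print(best_improvement(start, two_peaks))
    print(first_improvement(start, two_peaks, random.Random(0)))
```

On this toy function, best-improvement always maps a given starting point to the same optimum, whereas first-improvement started from `(0, 0, 1, 1)` may end in either optimum depending on the random ordering. This is the property that motivates the probabilistic definitions that follow.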

First, let us define the standard notion of a local optimum. 

Local optimum (LO). A local optimum, which is taken to be a maximum here, is
a solution $s^*$ such that $\forall s \in V(s^*)$, $f(s) < f(s^*)$.

Let us denote by $h$ the stochastic operator that associates to each solution $s$ the
solution obtained after applying one of the hill-climbing algorithms (see Algorithm 1)
for a sufficiently large number of iterations to converge to a LO. The size of the
landscape is finite, so we can denote by $LO_1, LO_2, LO_3, \ldots, LO_p$ the local optima.
These LOs are the vertices of the local optima network.

Now, we introduce the concept of basin of attraction to define the edges and weights
of our network model. Note that for each solution $s$, there is a probability that
$h(s) = LO_i$. We denote by $p_i(s)$ the probability $P(h(s) = LO_i)$. We have that for:

Best-improvement: for a given solution $s$, there is a (single) local optimum, and thus
an $i$, such that $p_i(s) = 1$ and $\forall j \neq i$, $p_j(s) = 0$.
First-improvement: for a given solution $s$, it is possible to have several local optima,
and thus several $i_1, i_2, \ldots, i_m$, such that $p_{i_1}(s) > 0, p_{i_2}(s) > 0, \ldots, p_{i_m}(s) > 0$.

For both models, we have, for each solution $s \in S$, $\sum_{i=1}^{p} p_i(s) = 1$.
Following the definition of the LON model in neutral fitness landscapes [9], we
have that:

Basin of attraction. The basin of attraction of the local optimum $i$ is the set
$b_i = \{ s \in S \mid p_i(s) > 0 \}$. This definition is consistent with our previous
definition [8] for the best-improvement case.
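
For a small landscape, the probabilities $p_i(s)$ and the resulting basins can be computed exactly. The sketch below assumes that, at each step, first-improvement moves to one of the improving neighbors chosen uniformly at random, so that $p_i(s)$ is the average of $p_i(x)$ over the improving neighbors $x$ of $s$. The toy fitness function and all names (`reach`, `basins`, `basins_per_solution`) are illustrative assumptions, not part of the original study.

```python
from fractions import Fraction
from itertools import product

N = 4  # toy string length (illustrative)


def neighbors(s):
    """Bit-flip neighborhood V(s)."""
    return [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(len(s))]


def fitness(s):
    """Hypothetical toy fitness with two local maxima: all zeros and all ones."""
    ones = sum(s)
    return max(4 * ones, 3 * (len(s) - ones))


_memo = {}


def reach(s):
    """Return {local optimum i: p_i(s)}, keeping only strictly positive entries."""
    if s not in _memo:
        better = [x for x in neighbors(s) if fitness(x) > fitness(s)]
        if not better:
            _memo[s] = {s: Fraction(1)}  # s is itself a local optimum
        else:
            probs = {}
            for x in better:  # each improving neighbor is equally likely (assumed)
                for lo, p in reach(x).items():
                    probs[lo] = probs.get(lo, Fraction(0)) + p / len(better)
            _memo[s] = probs
    return _memo[s]


solutions = list(product((0, 1), repeat=N))

# b_i = {s in S | p_i(s) > 0}: a solution may belong to several basins.
basins = {}
for s in solutions:
    for lo in reach(s):
        basins.setdefault(lo, set()).add(s)

# Average number of basins to which a solution belongs.
basins_per_solution = sum(len(reach(s)) for s in solutions) / len(solutions)
```

The recursion terminates because fitness strictly increases along every accepted move. On this toy landscape the two basins overlap: the six strings with two ones reach either optimum with probability 1/2 and therefore belong to both basins.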

The size of the basins of attraction can now be defined as follows: 

Size of a basin of attraction. The size of the basin of attraction of a local optimum
$i$ is $\sharp b_i = \sum_{s \in S} p_i(s)$.

Edge weight. We first reproduce the definition of edge weights for the non-neutral
landscape and best-improvement hill-climbing [8]. For each pair of solutions $s$ and $s'$,
let $p(s \to s')$ denote the probability that $s'$ is a neighbor of $s$, i.e. $s' \in V(s)$.
Therefore, we define below $p(s \to b_j)$, the probability that a configuration $s \in S$ has
a neighbor in a basin $b_j$, and $p(b_i \to b_j)$, the total probability of going from basin
$b_i$ to basin $b_j$, which is the average over all $s \in b_i$ of the transition probabilities
to solutions $s' \in b_j$ (where $\sharp b_i$ is the size of the basin $b_i$):

$$p(s \to b_j) = \sum_{s' \in b_j} p(s \to s'),$$

$$p(b_i \to b_j) = \frac{1}{\sharp b_i} \sum_{s \in b_i} p(s \to b_j).$$
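
As an illustration of the best-improvement definitions, the following sketch enumerates a toy landscape, assigns every solution to its basin, and accumulates the edge weights. It assumes the uniform bit-flip transition probability $p(s \to s') = 1/N$ for $s' \in V(s)$ and 0 otherwise, which the text does not state explicitly. The toy fitness function and the names `climb` and `lon_weights` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

N = 4  # toy string length (illustrative)


def neighbors(s):
    """Bit-flip neighborhood V(s)."""
    return [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(len(s))]


def fitness(s):
    """Hypothetical toy fitness with two local maxima: all zeros and all ones."""
    ones = sum(s)
    return max(4 * ones, 3 * (len(s) - ones))


def climb(s):
    """Best-improvement hill-climbing: returns the local optimum h(s)."""
    while True:
        best = max(neighbors(s), key=fitness)
        if fitness(best) <= fitness(s):
            return s
        s = best


def lon_weights():
    """Return {(i, j): p(b_i -> b_j)} with local optima used as node labels."""
    basin_of = {s: climb(s) for s in product((0, 1), repeat=N)}
    size = defaultdict(int)       # number of solutions in each basin
    flow = defaultdict(Fraction)  # sum over s in b_i of p(s -> b_j)
    for s, i in basin_of.items():
        size[i] += 1
        for x in neighbors(s):
            flow[(i, basin_of[x])] += Fraction(1, N)  # assumed p(s -> s') = 1/N
    return {edge: total / size[edge[0]] for edge, total in flow.items()}


weights = lon_weights()
```

On this toy landscape the basin of the all-zeros optimum holds 5 solutions and that of the all-ones optimum holds 11, and the two weights between them differ (3/5 in one direction, 3/11 in the other), which illustrates why the transition graph is oriented.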



For first-improvement and best-improvement hill-climbing, we have defined the
probability $p_i(s)$ that a solution $s$ belongs to a basin $i$. We can, therefore, modify
the previous definitions to consider both types of network models:
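
One way to write the modified definitions is given below. This is a reconstruction from the surrounding text rather than the authors' exact formulas: it assumes that each neighbor $s'$ is weighted by its membership probability $p_j(s')$, that each source configuration $s$ is weighted by $p_i(s)$, and that the basin size is $\sharp b_i = \sum_{s \in S} p_i(s)$.

```latex
\begin{align}
p(s \to b_j)   &= \sum_{s' \in b_j} p(s \to s')\, p_j(s'), \\
p(b_i \to b_j) &= \frac{1}{\sharp b_i} \sum_{s \in b_i} p_i(s)\, p(s \to b_j).
\end{align}
```

Under these assumptions, if $\sum_{s'} p(s \to s') = 1$ for every $s$, then $\sum_j p(s \to b_j) = 1$ because $\sum_j p_j(s') = 1$, and the outgoing weights of each basin sum to $\frac{1}{\sharp b_i} \sum_{s} p_i(s) = 1$.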



In the best-improvement case, we have $p_k(s) = 1$ for all the configurations in the basin
$b_k$. Therefore, the definition of weights for the best-improvement case is consistent
with the previous definition. Now, we are in a position to define the weighted local
optima network:

Local optima network. The weighted local optima network $G_w = (N, E)$ is the
graph where the nodes are the local optima, and there is an edge $e_{ij} \in E$, with weight
$w_{ij} = p(b_i \to b_j)$, between two nodes $i$ and $j$ if $p(b_i \to b_j) > 0$.

According to our definition of edge weights, $w_{ij} = p(b_i \to b_j)$ may be different
from $w_{ji} = p(b_j \to b_i)$. Thus, two weights are needed in general, and we have an
oriented transition graph.

3 Analysis of the local optima networks 

The NK family of landscapes [4] is a problem-independent model for constructing
multimodal landscapes that can gradually be tuned from smooth to rugged. In the
model, $N$ refers to the number of (binary) genes in the genotype (i.e. the string length)
and $K$ to the number of genes that influence a particular gene. By increasing the value
of $K$ from 0 to $N - 1$, NK landscapes can be tuned from smooth to rugged. The $K$
variables that form the context of the fitness contribution of gene $s_i$ can be chosen
according to different models. The two most widely studied models are the random
neighborhood model, where the $K$ variables are chosen randomly according to a uniform
distribution among the $N - 1$ variables other than $s_i$, and the adjacent neighborhood
model, in which the $K$ variables are those closest to $s_i$ in a total ordering
$s_1, s_2, \ldots, s_N$ (using periodic boundaries). No significant differences between the
two models were found in [?] in terms of the landscape global properties, such as mean
number of local optima or autocorrelation length. Similarly, our preliminary studies on
the characteristics of the NK landscape optima networks did not show noticeable
differences between the two neighborhood models. Therefore, we conducted our full
study on the more general random model.
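
The two neighborhood models can be illustrated with a small generator. The sketch below follows the usual NK construction, which this section does not spell out: each gene has a table of contributions drawn uniformly from [0, 1), indexed by its own value and the values of its $K$ linked genes, and the fitness is the mean contribution. The names `make_nk` and `count_local_optima`, the `seed` parameter and the tie-breaking of "closest" genes in the adjacent model are assumptions made for the example.

```python
import random
from itertools import product


def make_nk(n, k, model="random", seed=0):
    """Return (fitness, links) for one NK landscape instance."""
    rng = random.Random(seed)
    links = []
    for i in range(n):
        if model == "random":
            # K genes drawn uniformly among the n - 1 genes other than i.
            others = rng.sample([j for j in range(n) if j != i], k)
        else:
            # Adjacent model: offsets +1, -1, +2, -2, ... with periodic boundaries.
            offsets = [(d // 2 + 1) * (1 if d % 2 == 0 else -1) for d in range(k)]
            others = [(i + off) % n for off in offsets]
        links.append([i] + others)
    # One random contribution per gene and per configuration of its k + 1 inputs.
    tables = [
        {bits: rng.random() for bits in product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(s):
        total = sum(tables[i][tuple(s[j] for j in links[i])] for i in range(n))
        return total / n

    return fitness, links


def count_local_optima(fitness, n):
    """Count strict local maxima under the bit-flip neighborhood (exhaustive)."""
    count = 0
    for s in product((0, 1), repeat=n):
        flips = (s[:i] + (1 - s[i],) + s[i + 1:] for i in range(n))
        if all(fitness(x) < fitness(s) for x in flips):
            count += 1
    return count


if __name__ == "__main__":
    for k in (0, 2, 4, 7):
        f, _ = make_nk(8, k, seed=1)
        print(k, count_local_optima(f, 8))
```

With $K = 0$ every gene contributes independently, so the landscape has a single optimum; larger $K$ typically yields more local optima, which is the sense in which the landscape becomes more rugged.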

In order to minimize the influence of the random creation of landscapes, we considered
30 different and independent landscapes for each combination of $N$ and $K$ parameter
values. In all cases, the measures reported are the average over these 30 landscapes.
The study considered landscapes with $N \in \{14, 16\}$ and $K \in \{2, 4, \ldots, N - 1\}$,
which are the largest possible parameter combinations that allow the exhaustive
extraction of local optima networks. Both best-improvement and first-improvement local
optima networks (b-LON and f-LON, respectively) were extracted and analyzed.

3.1 Network features and connectivity 

This section reports the most commonly used features to characterise complex net- 
works, in both the f-LON and b-LON models. 

Table 1. NK landscapes network properties. Values are averages over 30 random
instances; standard deviations are shown as subscripts. $n_v$ and $n_e$ represent the
number of vertices and edges, $C^w$ the mean weighted clustering coefficient, $Y$ the
mean disparity coefficient, $d$ the mean path length, and $d_{best}$ the mean path length
to the global optimum (see text for definitions).

Number of nodes and edges: The 2nd column of Table 1 reports the number of
nodes (local optima), $n_v$, for all the studied landscapes. The b-LONs and f-LONs have
the same local optima, since both local search algorithms, although using a different
pivot rule, are based on the bit-flip neighborhood. The networks, however, have a
different number of edges, as can be appreciated in the 3rd and 4th columns of Table 1,
which report the number of edges normalized by the square of the number of nodes.
Clearly, the number of edges is much larger for the f-LONs. This number is always the
square of the number of nodes, which indicates that the f-LONs are complete graphs. It
is worth noticing, however, that many of the edges have very low weights (see Figure
3). For the b-LON model, the number of edges decreases steadily with increasing values
of $K$.

Clustering coefficient or transitivity: The clustering coefficient of a network is the
average probability that two neighbors of a given node are also neighbors of each
other. In the language of social networks, the friend of your friend is likely also to be
your friend. The standard clustering coefficient [5] does not consider weighted edges.
We thus used the weighted clustering measure proposed by [1]. The 5th column of Table
1 lists the average coefficients of the b-LONs for all $N$ and $K$. It is apparent that the
clustering coefficients decrease regularly with increasing $K$, which indicates that there
are fewer transitions between neighboring basins for high $K$, or that the transitions
are less likely to occur, or both. On the other hand, the f-LONs correspond to complete
networks; the calculation of the clustering coefficients revealed that $\forall i$,
$c^w(i) = 1.0$ (not shown in the Table). Therefore, the f-LON is densely connected for all
values of $K$.

Disparity: The disparity measure proposed in [?], $Y(i)$, gauges the heterogeneity
of the contributions of the edges of node $i$ to the total weight. The 6th and 7th columns
in Table 1 report the disparity coefficients for the two network models. The
heterogeneity decreases with increasing values of $K$. This reflects that with high values
of $K$, the transitions to other basins tend to become equally likely, an indication of a
more random structure (and thus a difficult search). It can also be seen that the weights
for the f-LON model are less heterogeneous (more uniform) than for the b-LON one.
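
The text does not restate the disparity formula. A commonly used form, assumed here for illustration, is $Y(i) = \sum_{j \neq i} (w_{ij} / s_i)^2$ with $s_i = \sum_{j \neq i} w_{ij}$. The sketch below computes it from a dictionary of edge weights; the function name and the data layout are hypothetical.

```python
def disparity(weights, i):
    """Disparity Y(i) of node i, given weights as {(source, target): w}.

    Assumed form: sum over j != i of (w_ij / s_i)^2, where s_i is the total
    outgoing weight of node i excluding the self-loop.
    """
    out = [w for (a, b), w in weights.items() if a == i and b != i]
    strength = sum(out)
    if strength == 0:
        return 0.0  # isolated node: no outgoing weight to distribute
    return sum((w / strength) ** 2 for w in out)


if __name__ == "__main__":
    uniform = {("a", "a"): 0.2, ("a", "b"): 0.2, ("a", "c"): 0.2,
               ("a", "d"): 0.2, ("a", "e"): 0.2}
    skewed = {("a", "b"): 0.97, ("a", "c"): 0.01,
              ("a", "d"): 0.01, ("a", "e"): 0.01}
    print(disparity(uniform, "a"), disparity(skewed, "a"))
```

Under this form, a node whose $k$ outgoing edges carry equal weight has $Y(i) = 1/k$, while a node dominated by a single edge has $Y(i)$ close to 1, so lower values correspond to more uniform transition probabilities.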

Shortest path length: Another standard metric to characterize the structure of
networks is the shortest path length (number of link hops) between two nodes on the
network. In order to compute this measure on the optima network of a given landscape,
we considered the expected number of bit-flip mutations to pass from one basin to the
other. This expected number can be computed by considering the inverse of the
transition probabilities between basins. More formally, the distance between two nodes
is defined by $d_{ij} = 1/w_{ij}$ where $w_{ij} = p(b_i \to b_j)$. Now, we can define the
length of a path between two nodes as being the sum of these distances along the edges
that connect the respective basins. The 8th and 9th columns in Table 1 report this
measure on the two network models. In both cases, the shortest path increases with $K$;
however, for the b-LON the growth stagnates for larger $K$ values. The paths are
considerably longer for the f-LON, with the exception of the lowest values of $K$. Some
paths are more relevant from the point of view of a stochastic local search algorithm
following a trajectory over the maxima network. Therefore, the 10th and 11th columns in
Table 1 report the shortest path length to the global optimum from all the other optima
in the landscape. The trend is clear: the path lengths to the optimum increase steadily
with increasing $K$, and similarly, the first-improvement network shows longer paths.
This suggests that a larger number of hops will be needed to find the global optimum
when a first-improvement local search is used. We must consider, however, that the
number of evaluations needed to explore a basin would be $N$ times lower for
first-improvement than for best-improvement.
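
The path length defined above can be computed with a standard shortest-path algorithm over the distances $d_{ij} = 1/w_{ij}$. The following sketch uses Dijkstra's algorithm on a dictionary of edge weights; the function name, the data layout and the example weights are illustrative assumptions.

```python
import heapq


def shortest_paths(weights, source):
    """Shortest path lengths from source, with edge length d_ij = 1 / w_ij.

    weights maps (i, j) to the transition probability w_ij; self-loops and
    zero-weight edges are ignored. Unreachable nodes are absent from the result.
    """
    graph = {}
    for (i, j), w in weights.items():
        if i != j and w > 0:
            graph.setdefault(i, []).append((j, 1.0 / w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


if __name__ == "__main__":
    w = {("A", "B"): 0.5, ("B", "C"): 0.25, ("A", "C"): 0.1, ("A", "A"): 0.4}
    print(shortest_paths(w, "A"))
```

In the example, the direct edge from A to C has length 10, but the path through B has length 2 + 4 = 6, so a chain of likely transitions can be shorter than a single unlikely one. Path lengths to the global optimum from every other node can be obtained in one pass by running the same routine from the optimum on the graph with all edges reversed.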



Outgoing weight distribution: The standard topological characterization of
(unweighted) networks is obtained by its degree distribution. The degree of a node is
defined as its number of neighbours, and the degree distribution of a network is the
distribution over the frequencies of different degrees over all nodes in the network. For
weighted networks, a characterization of weights is obtained by the connectivity and
weight distributions $p_{in}(w)$ and $p_{out}(w)$ that any given edge has incoming or
outgoing weight $w$. In our study, for each node $i$, the sum of outgoing edge weights is
equal to 1 as they represent transition probabilities. So, an important measure is the
weight $w_{ii}$ of self-connecting edges (remaining in the same node). We have the
relation $w_{ii} + s_i = 1$, where $s_i = \sum_{j \neq i} w_{ij}$ is the total weight of the
edges leaving node $i$ for other nodes.

Figure 2 reports the outgoing weight distributions $p_{out}(w)$ (in log-scale on the
x-axis) of both the f-LON and b-LON networks on a selected landscape with $K = 6$ and
$N = 16$. One can see that the weights, i.e. the transition probabilities to neighboring
basins, are small. The distributions are far from uniform or Poissonian, and they are not
close to power laws either. We could not find a simple fit to the curves, such as
stretched exponentials or exponentially truncated power laws. It can be seen that the
distributions differ for the first and best LON models. There is a larger number of edges
with low weights for the f-LONs than for the b-LONs. Thus, even though the f-LONs are
more densely connected (indeed they are complete graphs), many of the edges have very
low weights. Figure 3 (left) shows the averages, over all the nodes in the network, of the
weights $w_{ii}$ (i.e. the probabilities of remaining in the same basin after a bit-flip
mutation) for $N = 16$ and all the $K$ values. Notice that, for both network models, the
weights $w_{ii}$ are much higher than the weights $w_{ij}$ with $j \neq i$ (see Fig. 3,
right). The $w_{ii}$ are much lower for the first than for the best LON. In particular, in
the b-LON, for $K = 2$, 50% of the random bit-flip mutations will produce a solution
within the same basin of attraction, whereas this figure is less than 20% in the f-LON.
Indeed, in this case, for $K$ greater than 4, the probabilities of remaining in the same
basin fall below 10%, which suggests that escaping from local optima would be easier for
a first-improvement local searcher.

Fig. 2. Probability distribution of the network weights $w_{ij}$ for outgoing edges with
$j \neq i$ (in log-scale on the x-axis) for N = 16, K = 6. Averages on 30 independent
landscapes.

Fig. 3. Averages of $w_{ii}$ weights (left), and averages of $w_{ij}$ weights with
$j \neq i$ (right), for landscapes with N = 16 and all the K values.



3.2 Basins of attraction features 

The previous section studied and compared the basic network features and connectivity
of the first and best LONs. The exhaustive extraction of the networks also produced
detailed information on the corresponding basins of attraction. Therefore, this section
discusses the most relevant of the basin features.

Size of the global optimum basin: When exploring the average size of the global
optimum basin of the f-LONs, we found that it decreases exponentially with increasing
ruggedness ($K$ values). This is consistent with the results for the b-LON on these
landscapes [8]. Moreover, the basin sizes for both networks are similar, with those of
the f-LON being slightly smaller. This may suggest that, for the same number of runs,
the success rate of a first-improvement heuristic would be lower. One needs to consider,
however, that the number of evaluations per run is smaller in this case.

Basin sizes of the two network models: A comparative study of the basin sizes 
of the two network models revealed that they are highly correlated. Only the small- 
est basins of the f-LON model are larger in size when compared to the corresponding 
smallest basins in the b-LON model. 

Basin size and fitness of local optima: Fig. 4 reports the correlation coefficients
$\rho$ between the networks' basin sizes and their fitness, for both the first and best
LONs, and landscapes with $N = 16$ and all the $K$ values. It can be observed that there
is a strong correlation between fitness and basin sizes for both types of networks.
Indeed, for $K < 10$, the correlation coefficient exceeds 0.8. For rugged landscapes,
$K > 8$, the f-LON shows reduced and decreasing coefficients as compared to the b-LON.

Fig. 4. Average of the correlation coefficient between the fitness of local optima and
their corresponding basin sizes on 30 independent landscapes for both f-LON and
b-LON (N = 16, and all the K values).

Number of basins per solution on the f-LONs: According to the definition of
basins (see Section 2), for the f-LON, a given solution may belong to a set of basins.
Fig. 5(a) shows the average number of basins to which a solution belongs (i.e.
$\sharp\{i \mid p_i(s) > 0\}$). It can be observed that for $N = 16$ and $K = 4$, a solution
belongs to nearly 70% of the total number of basins, whereas for $K = 14$, a solution
belongs to less than 30% of the total number of basins. On average, a solution belongs to
fewer basins for high $K$ than for low $K$. An exploration of the average number of basins
per solution, according to the solution fitness value (Fig. 5(b), for $N = 16$), reveals a
striking difference. While low fitness solutions belong to nearly all basins, high fitness
solutions belong to at most one basin. The figure suggests the presence of a phase
transition, in which the threshold of the transition is lower for high $K$ than for low $K$.
This suggests that the structure of the f-LON network for solutions with high fitness
resembles that of the b-LON, whereas the topology is different with respect to solutions
with low fitness.

Fig. 5. (a) Average number of basins to which a solution belongs. (b) For N = 16 and 3
selected values of K, the number of basins per solution according to the solution fitness
value. Averages on 30 independent landscapes.



4 Discussion 



We have extended the recently proposed Local Optima Network (LON) model to analyze
the structural differences between first-improvement and best-improvement local search,
in terms of the local optima network connectivity and the nature of the corresponding
basins of attraction. The results of the analysis on a set of NK landscapes can be
summarized as follows. The impact of landscape ruggedness ($K$ value) on the network
features is similar for both models. First-improvement induces a densely connected
network (indeed a complete network), while this is not the case for the best-improvement
model. However, many of the edges in the f-LON networks have very low weights. In
particular, the self-connections (i.e. the probabilities of remaining in the same basin
after a bit-flip mutation) are much smaller in the f-LON than in the b-LON model, which
suggests that escaping from local optima would be easier for a first-improvement local
searcher. The path lengths between local optima, and between any optimum and the
global optimum, are generally larger in f-LON than in b-LON networks. We must
consider, however, that the number of evaluations needed to explore a basin would be $N$
times lower for first-improvement than for best-improvement. We, therefore, suggest
that first-improvement is a better heuristic for exploring NK landscapes. Our
preliminary empirical results support this insight; a detailed account of them will be
presented elsewhere due to space restrictions.

Most of our work on the local optima model has been based on binary spaces and NK
landscapes. However, we have recently started the exploration of permutation search
spaces, specifically the Quadratic Assignment Problem (QAP) [2], which opens up the
possibility of analyzing other permutation-based problems such as the traveling
salesman and the permutation flow shop problems. Our current definition of transition
probabilities, although very informative, produces highly connected networks, which are
not easy to study. Therefore, we are currently considering alternative definitions and
threshold values for the connectivity. Finally, although the local optima network model
is still under development, we argue that it offers an alternative view of combinatorial
fitness landscapes, which can potentially contribute to both our understanding of
problem difficulty and the design of effective heuristic search algorithms.